Deep Ensemble Learning for Monaural Speech Separation
Authors
Abstract
Monaural speech separation is a fundamental problem in robust speech processing. Recently, deep neural network (DNN) based speech separation methods, which predict either the clean speech or an ideal time-frequency mask, have demonstrated remarkable performance improvements. However, a single DNN with a given window length does not leverage contextual information sufficiently, and the differences between the two optimization objectives are not well understood. In this paper, we propose to stack ensembles of DNNs, named multi-resolution stacking, to address monaural speech separation. Each DNN in a module of the stack takes the concatenation of the original acoustic features and an expansion of the soft output of the lower module as its input, and predicts the ideal ratio mask of the target speaker. The DNNs in the same module explore different contexts by employing different window lengths. We have conducted extensive experiments on three speech corpora. The results demonstrate the effectiveness of the proposed method. We have also compared the two optimization objectives systematically and found that predicting the ideal time-frequency mask is more efficient in utilizing clean training speech, while predicting the clean speech is less sensitive to SNR variations.
Index Terms – Deep neural networks, ensemble learning, mapping-based separation, masking-based separation, monaural speech separation, multi-resolution stacking.
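To make the multi-resolution stacking idea above concrete, the following is a minimal PyTorch sketch of one stacking module. The feature dimension, hidden-layer width, the set of window lengths, and the simple averaging used to combine the module's DNN outputs are all assumptions for illustration, not the paper's configuration. Each DNN receives the original features concatenated with the context-expanded soft output of the lower module and predicts a per-frame ratio mask.

```python
# Hedged sketch of multi-resolution stacking; sizes and window lengths are
# illustrative assumptions, not the authors' exact setup.
import torch
import torch.nn as nn

FEATURE_DIM = 64            # per-frame acoustic feature dimension (assumed)
MASK_DIM = 64               # ideal-ratio-mask dimension per frame (assumed)
WINDOW_LENGTHS = [1, 3, 5]  # context window (in frames) per DNN in a module (assumed)


def splice(frames, win):
    """Concatenate `win` neighbouring frames around each centre frame
    (symmetric context window, edges padded by repetition)."""
    half = win // 2
    padded = torch.cat([frames[:1].repeat(half, 1),
                        frames,
                        frames[-1:].repeat(half, 1)], dim=0)
    return torch.cat([padded[i:i + frames.size(0)] for i in range(win)], dim=1)


class MaskDNN(nn.Module):
    """One DNN mapping spliced input features to a per-frame ratio mask."""
    def __init__(self, in_dim, hidden=512):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(in_dim, hidden), nn.ReLU(),
            nn.Linear(hidden, hidden), nn.ReLU(),
            nn.Linear(hidden, MASK_DIM), nn.Sigmoid())  # mask values in [0, 1]

    def forward(self, x):
        return self.net(x)


class StackingModule(nn.Module):
    """One module of the stack: several DNNs with different context windows.
    Each DNN sees [original features ++ soft output of the lower module],
    expanded to its own window length."""
    def __init__(self, lower_dim):
        super().__init__()
        self.dnns = nn.ModuleList(
            [MaskDNN(win * (FEATURE_DIM + lower_dim)) for win in WINDOW_LENGTHS])

    def forward(self, features, lower_soft_output):
        per_frame = torch.cat([features, lower_soft_output], dim=1)
        masks = [dnn(splice(per_frame, win))
                 for dnn, win in zip(self.dnns, WINDOW_LENGTHS)]
        # Average the DNN outputs to form this module's soft output.
        return torch.stack(masks).mean(dim=0)


# Toy usage: a bottom module fed only acoustic features, and one upper module
# that refines the bottom module's soft mask estimate.
bottom = StackingModule(lower_dim=0)
upper = StackingModule(lower_dim=MASK_DIM)

feats = torch.randn(100, FEATURE_DIM)        # 100 frames of features
soft0 = bottom(feats, torch.zeros(100, 0))   # bottom module: no lower output
irm_estimate = upper(feats, soft0)           # upper module refines the mask
print(irm_estimate.shape)                    # torch.Size([100, 64])
```

In a full system, further modules would be stacked in the same way, each taking the acoustic features together with the expanded soft output of the module below.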
Similar resources
Multi-Target Ensemble Learning for Monaural Speech Separation
Speech separation can be formulated as a supervised learning problem where a machine is trained to map the acoustic features of the noisy speech to a time-frequency mask, or to the spectrum of the clean speech. These two categories of speech separation methods are generally referred to as masking-based and mapping-based methods, but neither can perfectly estimate the clean speech, si...
(The two training targets are contrasted in the sketch after this list.)
Supervised Speech Separation Based on Deep Learning: An Overview
Speech separation is the task of separating target speech from background interference. Traditionally, speech separation is studied as a signal processing problem. A more recent approach formulates speech separation as a supervised learning problem, where the discriminative patterns of speech, speakers, and background noise are learned from training data. Over the past decade, many supervised s...
Deep Transform: Cocktail Party Source Separation via Probabilistic Re-Synthesis
In cocktail party listening scenarios, the human brain is able to separate competing speech signals. However, the signal processing implemented by the brain to perform cocktail party listening is not well understood. Here, we trained two separate convolutive autoencoder deep neural networks (DNN) to separate monaural and binaural mixtures of two concurrent speech streams. We then used these DNN...
A Feature Study for Masking-Based Reverberant Speech Separation
Monaural speech separation in reverberant conditions is very challenging. In masking-based separation, features extracted from speech mixtures are employed to predict a time-frequency mask. Robust feature extraction is crucial for the performance of supervised speech separation in adverse acoustic environments. Using objective speech intelligibility as the metric, we investigate a wide variety ...
Deep Transform: Cocktail Party Source Separation via Complex Convolution in a Deep Neural Network
Convolutional deep neural networks (DNN) are state of the art in many engineering problems but have not yet addressed the issue of how to deal with complex spectrograms. Here, we use circular statistics to provide a convenient probabilistic estimate of spectrogram phase in a complex convolutional DNN. In a typical cocktail party source separation scenario, we trained a convolutional DNN to re-s...
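Since both the paper above and the first related article contrast masking-based and mapping-based targets, here is a small hedged NumPy sketch of one common definition of each target. The ideal-ratio-mask formula, the log compression, and the additive magnitude-domain mixture are illustrative assumptions, not necessarily the definitions used in the cited papers.

```python
# Hedged sketch of the two supervised targets: a time-frequency mask
# (masking-based) versus the clean spectrum itself (mapping-based).
import numpy as np


def ideal_ratio_mask(clean_mag, noise_mag, eps=1e-8):
    """Masking-based target: per time-frequency unit, the fraction of
    energy attributed to the target speech (values in [0, 1])."""
    return clean_mag**2 / (clean_mag**2 + noise_mag**2 + eps)


def mapping_target(clean_mag):
    """Mapping-based target: the clean spectrum, log-compressed here (assumed)."""
    return np.log1p(clean_mag)


# Toy magnitude spectrograms (257 frequency bins x 100 frames).
rng = np.random.default_rng(0)
clean = np.abs(rng.standard_normal((257, 100)))
noise = np.abs(rng.standard_normal((257, 100)))
mixture = clean + noise  # crude additivity assumption in the magnitude domain

irm = ideal_ratio_mask(clean, noise)
separated = irm * mixture             # masking-based systems apply the estimated mask
target_spec = mapping_target(clean)   # mapping-based systems predict this directly
```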
Journal title:
Volume / Issue:
Pages: -
Publication date: 2015